exit gracefully with large file #17

Merged
m-y-mo merged 2 commits into main from large_file on Dec 11, 2025
Conversation

m-y-mo (Contributor) commented on Dec 11, 2025

Avoids token-limit errors when fetching a large file.

Copilot AI review requested due to automatic review settings December 11, 2025 14:15

Copilot AI left a comment


Pull request overview

This PR adds a safety check to prevent large files from being fetched entirely, which could cause token limit errors. The implementation adds a 1000-line threshold before returning file content.

  • Adds a size check that returns an error message for files exceeding 1000 lines
  • Directs users to use the get_file_lines tool for large files instead


if not source_path or not source_path.exists():
    return f"Invalid {owner} and {repo}. Check that the input is correct or try to fetch the repo from gh first."
lines = get_file(source_path, path)
if len(lines) > 1000:

Copilot AI Dec 11, 2025


The magic number 1000 should be extracted as a named constant at the module level. This makes the limit configurable and self-documenting. Consider defining something like MAX_FILE_LINES = 1000 at the top of the file with the other constants.
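The suggested refactor can be sketched as below. This is only an illustration: `read_file` is a hypothetical wrapper name, and `get_file` is a stub standing in for the repo's real zip-reading helper (which returns a list of `"N: content"` lines).

```python
MAX_FILE_LINES = 1000  # suggested module-level constant replacing the bare 1000


def get_file(source_path, path):
    # Hypothetical stub standing in for the repo's real helper;
    # here it simulates a file with 1500 numbered lines.
    return [f"{i + 1}: x" for i in range(1500)]


def read_file(source_path, path, owner, repo):
    lines = get_file(source_path, path)
    if len(lines) > MAX_FILE_LINES:
        return (
            f"File {path} in {owner}/{repo} is too large to display "
            f"({len(lines)} lines). Please fetch specific lines using get_file_lines tool."
        )
    return "\n".join(lines)
```

With the constant in place, tuning the limit (or reading it from configuration) touches one line instead of every call site.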

Comment on lines +109 to +110
if len(lines) > 1000:
    return f"File {path} in {owner}/{repo} is too large to display ({len(lines)} lines). Please fetch specific lines using get_file_lines tool."

Copilot AI Dec 11, 2025


The size check happens after get_file has already loaded all lines into memory (line 108). For very large files, this could still cause memory issues or performance problems before the check is performed. Consider checking the file size or line count before fully loading the content, or use a streaming approach that stops reading once the limit is exceeded.

See below for a potential fix:

    # Stream file lines to avoid loading large files into memory
    lines = []
    found = False
    filename = strip_leading_dash(path)
    with zipfile.ZipFile(source_path) as z:
        for entry in z.infolist():
            if entry.is_dir():
                continue
            if remove_root_dir(entry.filename) == filename:
                found = True
                with z.open(entry, 'r') as f:
                    for i, line in enumerate(f):
                        if i >= 1000:
                            return f"File {path} in {owner}/{repo} is too large to display (over 1000 lines). Please fetch specific lines using get_file_lines tool."
                        # ZipFile.open yields bytes; decode before formatting
                        lines.append(f"{i + 1}: {line.decode('utf-8', errors='replace').rstrip()}")
                break
    if not found:
        return f"Unable to find file {path} in {owner}/{repo}"

m-y-mo merged commit 29c6c2d into main on Dec 11, 2025
9 checks passed
m-y-mo deleted the large_file branch on December 11, 2025 at 16:04
kiandadban pushed a commit to kiandadban/seclab-taskflows that referenced this pull request May 4, 2026

4 participants